Loudness Measurement of Multitrack Audio Content using Modifications of ITU-R BS.1770
Authors
Abstract
The recent loudness measurement recommendations by the ITU and the EBU have gained widespread recognition in the broadcast community. The material they deal with is usually full-range mastered audio content, and their applicability to multitrack material is not yet clear. In the present work we investigate how well the perceived loudness of single tracks, as evaluated by listeners, agrees with the measured value as defined by ITU-R BS.1770. We analyze the underlying features that may be the cause of the disparity and propose some parameter alterations that might yield better results for multitrack material with minimal change to the rating of broadcast content. The best parameter sets are then evaluated by a panel of experts in terms of how well they produce an equal-loudness multitrack mix, and are shown to be significantly more successful.

1. LOUDNESS MEASUREMENT

Over the last decade there has been a significant amount of research on broadcast-related loudness perception and metering, a trend much inspired by the ITU efforts. This initiative led to recommendation ITU-R BS.1770 [1], later extended by the EBU R128 recommendation [2]. Recent work [3], [4] has already treated the loudness of multitrack material according to the BS.1770 / R128 loudness measurement recommendations with some level of success. It is not clear, however, how well the measurement can be applied to judging the loudness of individual sound sources, since it was created for pre-mixed broadcast material. Through informal observation, the authors have noticed that this algorithm disagrees with perception in some consistent ways; initial results were described in [5]. Our observations indicate that the loudness of percussive material whose spectral content is confined to the high range (e.g. hi-hats, shakers, tambourines) is often underestimated by the algorithm.
The loudness measurement recommendation we are investigating, outlined in [1], is a straightforward single-band, level-independent system. The signal is passed through two biquad filters, termed the prefilter (a +4 dB high shelf at around 1681 Hz) and the RLB filter (a high-pass filter with a 38 Hz cutoff). The filtered signal is then squared and its level measured over a time constant of 400 ms. For a more thorough explanation the interested reader is referred to the ITU and EBU documentation [1, 2]. A signal that is measured according to this recommendation has a value given in Loudness Units (LU), a logarithmic unit similar to the decibel.

In Section 2 we summarize the findings of the subjective listening test in [5], which set out to establish whether the discrepancy that was noted holds for a diverse panel of individuals. We further explore the results in Section 3, looking for underlying features that might explain why certain types of single tracks are misjudged by the algorithm. In Section 4, some algorithm parameter tweaks are proposed that might provide better loudness measurement for a more diverse range of material. The gain in effectiveness from the modifications is analyzed and, in Section 5, the most promising solutions are evaluated by a panel of expert listeners.

2. SUBJECTIVE TESTING

The tests described in [5] were performed at Lusíada University's AudioLab and at an audio classroom at the Restart Institute in Lisbon. Forty subjects used professional studio-grade headphones with full-range frequency specifications, through the exact same audio chain, calibrated so that it delivered 83 dB SPL measured with a dummy head. This value conforms to mixing recommendations (e.g. [6]) which suggest that the listener should be at a medium equal-loudness contour level.

Footnote 1: Three professional sound engineers, fifteen final-year students in audio, and twenty-two multimedia and music students with some (limited) exposure to audio engineering.
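Returning to the measurement chain outlined in Section 1 (prefilter, RLB filter, squaring, level over 400 ms), the following is a minimal sketch of how such a measurement could be computed for a single mono track. It is not the authors' code. It assumes a 48 kHz sample rate and uses the biquad coefficients tabulated in BS.1770 for that rate, and it omits the gating and multichannel summation of the full recommendation. The function names `biquad`, `block_loudness`, and `sine` are hypothetical helpers introduced only for this illustration.

```python
import math

FS = 48000  # assumed sample rate; the coefficients below are the 48 kHz ones

# Biquad coefficients as (b0, b1, b2, a1, a2), with a0 normalized to 1.
PREFILTER = (1.53512485958697, -2.69169618940638, 1.19839281085285,
             -1.69065929318241, 0.73248077421585)   # high shelf, about +4 dB
RLB = (1.0, -2.0, 1.0,
       -1.99004745483398, 0.99007225036621)          # high-pass, about 38 Hz


def biquad(x, coeffs):
    """Filter a sequence of samples with one direct-form I biquad section."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out


def block_loudness(x, fs=FS, window_s=0.4):
    """Ungated loudness of the last `window_s` seconds of a mono signal."""
    n = int(round(window_s * fs))
    if len(x) < n:
        raise ValueError("signal is shorter than the measurement window")
    weighted = biquad(biquad(x, PREFILTER), RLB)
    tail = weighted[-n:]                      # skip the filter start-up transient
    mean_square = sum(v * v for v in tail) / n
    if mean_square <= 0.0:
        return float("-inf")
    return -0.691 + 10.0 * math.log10(mean_square)


def sine(freq, amp=1.0, dur=1.0, fs=FS):
    """Hypothetical test-signal helper: a sine of `dur` seconds."""
    return [amp * math.sin(2.0 * math.pi * freq * i / fs)
            for i in range(int(dur * fs))]
```

As a sanity check, `block_loudness(sine(997.0))` should read close to -3.0 for a full-scale tone, and a high-frequency tone of the same amplitude should read higher than a low-frequency one because of the high shelf. A narrow-band track such as a hi-hat would be measured the same way, which is where the disagreement with perception discussed in this paper would show up.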
The procedure was explained and the instructions were given before the test, and no one showed any doubt as to what was required. A preliminary ‘calibration’ test was run first, to check whether the EBU R128 recommendation agrees with human perception across subjects. This test used broadband material (full mixes), and the results showed strong consistency within subjects and between subjects and the algorithm at the reference level, allowing the authors to proceed with the main question.

The main test aimed at the evaluation of multitrack content. We had five songs split into individual tracks (each song had 9–11 different tracks). Subjects were given a fixed reference track and asked to alter the level of the remaining tracks, using a set of faders, until they sounded as loud as the reference track. All the tracks had previously been normalized to yield the same loudness according to the algorithm, so if a subject set all the faders at unity, it would mean perfect agreement between measurement and evaluation. If a subject moves a level above or below unity, there is a loudness evaluation difference, which we calculate and present as our main variable. It was emphasized that this was a loudness-matching task, given that the subjects were used to working with a different mindset in their profession or studies. Many subjects admitted after completing the test that it was very hard for them to keep their focus on equal loudness. Some examples were duplicated, so that we could further test for consistency.

We have been guided by the concerns and methodology suggested by Bech and Zacharov [7], and particularly by the great care with which similar tests in Skovenborg et al. [8] were designed. The test design did not allow every track to serve as the reference, as the test duration would have become unwieldy. Our fixed references were the kick drum (results shown in Fig. 1) and the vocals (results shown in Fig. 2) on alternate examples. Both elements were previously equalized so that they had similar spectral content across all five songs. This did not guarantee by itself that they would elicit equal loudness perception, but differences in answers

Footnote 2: Unity here does not imply that the faders were marked and scaled the same; it is merely our own hidden unity reference. Unity level was not always at the same fader position, and subjects were alerted to that fact and told not to mix visually.

Pestana et al., Analyzing the ITU Loudness Measurement Towards Better Evaluation of Narrow-band Material. AES 134 Convention, Rome, Italy, 2013 May 4–7.

[Figure: panel label "Vocals"; axis ticks from 0 to +10]
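The bookkeeping behind the main test (normalizing every track to the same measured loudness, then reading each fader position as a deviation from unity) can be illustrated with a small sketch. The function names and all numeric values here are hypothetical and are not taken from the study's data; the sketch only shows the arithmetic implied by the procedure described above.

```python
import math


def normalization_gain_db(measured_lu, target_lu):
    """Gain in dB that brings a track's measured loudness to the target."""
    return target_lu - measured_lu


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def evaluation_difference_db(fader_gain_linear):
    """Deviation of a subject's fader setting from unity, in dB.

    Unity (1.0) gives 0 dB, meaning the subject agrees with the measurement.
    """
    return 20.0 * math.log10(fader_gain_linear)


# Hypothetical measured loudness per track before normalization.
measured = {"kick": -18.0, "hi-hat": -27.0, "vocals": -20.0}
target = -23.0  # hypothetical common target

gains_db = {name: normalization_gain_db(lu, target) for name, lu in measured.items()}
# gains_db == {"kick": -5.0, "hi-hat": 4.0, "vocals": -3.0}
```

Under this convention, a subject who pulls the normalized hi-hat down to a linear fader gain of 0.5 produces a difference of about -6 dB. A negative value means the track sounded louder to the subject than the measurement suggested, which is the kind of underestimation the paper reports for narrow-band percussive material.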
Similar resources
Level and distortion in digital broadcasting
The purpose of this article is to justify and recommend more fitting ways of measuring and controlling the audio level in digital broadcasting than looking at isolated samples or quasipeak levels. The new ITU-R BS.1770 standard, specifying long-term loudness and peak-level detection, is evaluated and a centre of gravity approach to loudness control is suggested. Metadata associated with Dolby A...
Loudness Measurement and Control
Sudden changes in loudness because of loud commercials and variations in loudness levels between programs have been a major source of nuisance for the consumers. With the transition to Digital TV, loudness related issues have not only continued but are on the rise. Traditional techniques of loudness measurement are ineffective for digital broadcast. Techniques for digital broadcast need to be o...
Specifying Audio for HD
The numbers of broadcast channels and platforms are going up, while the number of viewers remains the same. Digitization was expected to help streamlining audio delivery, but this has not happened yet. Confusion about peak level, loudness, end-listener requirements, formats, and the generation of metadata, has made digital broadcast an obstacle rather than the simplification needed. This paper ...
Partial Loudness in Multitrack Mixing
Partial loudness can be used as high-level, perceptually relevant metadata in the context of semantic audio, especially in multitrack mixtures or wherever masking scenarios are desired. Subjective evaluation of the partial loudness model of Glasberg and Moore on multitrack signals in the form of equal loudness matching experiment is presented. The observed results imply that the current model u...
Audio Engineering Society
In recent years, the increasing popularity of portable media devices among consumers has created new and unique audio challenges for content creators, distributors as well as device manufacturers. Many of the latest devices are capable of supporting a broad range of content types and media formats including those often associated with high quality (wider dynamic-range) experiences such as HDTV,...